EMIT Data is provided in non-orthorectified format to reduce data size. The location group contains latitude and longitude values of each pixel as well as a geometric lookup table (GLT) that can be used to orthocorrect the imagery. The GLT is an array that provides relative downtrack and crosstrack reference locations from the raw scene to facilitate fast projection of the dataset.
This notebook will walk through using the emit_tools module to do the orthorectification process as well as a in depth walkthrough of the process.
If you are planning to convert the EMIT netCDF4 files to .envi format, the reformat.py available in the emit-sds/emit-utils repository can be used to orthorectify the imagery during the conversion process.
Requirements:
emit_tutorials environment as the kernel for this notebook.../data/ folder.Learning Objectives
xarray.DatasetImport the required Python libraries.
# Import Packages
import os
import netCDF4 as nc
from osgeo import gdal
import numpy as np
import xarray as xr
import hvplot.xarray
import holoviews as hv
import sys
sys.path.append('../modules/')
from emit_tools import emit_xarray
Next, set the path to an EMIT dataset as an object. In this example we use an L2A Reflectance file.
fp = '../data/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc'
Import the emit_xarray function from the emit_tools module and use it to read in the data.
The emit_xarray function will open and and place data from the groups of an EMIT dataset into an xarray.Dataset. It includes an ortho option which is set to True by default. This function reads the root group, location group, and sensor_band_parameters groups from the EMIT dataset. It then flattens all of the variables contained within those 3 groups into a single xarray_dataset. When ortho=True, it uses the GLT layers to build a lat/lon grid and places the values from the reflectance root group into the grid. The lat and lon grid is used as dimensional coordinates in the output xarray.dataset while the wavelengths and fwhm from the sensor_band_parameters group are used as non-dimensional coordinates. The reflectance data from the root group of the netCDF is then orthorectified and added as a variable in the xarray.Dataset. This function also includes an optional quality_mask parameter, which can be used to mask poor quality data. For more on masking, see How to use EMIT Quality Data.
Read in a dataset using the emit_xarray function.
ds = emit_xarray(fp)
ds
<xarray.Dataset>
Dimensions: (latitude: 2009, longitude: 2353, bands: 285)
Coordinates:
* latitude (latitude) float64 -39.31 -39.31 -39.31 ... -40.39 -40.4 -40.4
* longitude (longitude) float64 -62.51 -62.51 -62.51 ... -61.24 -61.24
wavelengths (bands) float32 381.0 388.4 395.8 ... 2.486e+03 2.493e+03
fwhm (bands) float32 8.415 8.415 8.415 8.415 ... 8.806 8.807 8.809
spatial_ref int32 0
Dimensions without coordinates: bands
Data variables:
reflectance (latitude, longitude, bands) float32 nan nan nan ... nan nan
Attributes: (12/38)
ncei_template_version: NCEI_NetCDF_Swath_Template_v2.0
summary: The Earth Surface Mineral Dust Source ...
keywords: Imaging Spectroscopy, minerals, EMIT, ...
Conventions: CF-1.63
sensor: EMIT (Earth Surface Mineral Dust Sourc...
instrument: EMIT
... ...
southernmost_latitude: -40.39610428069674
spatialResolution: 0.000542232520256367
spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHER...
geotransform: [-6.25120945e+01 5.42232520e-04 -0.00...
day_night_flag: Day
title: EMIT L2A Surface Reflectance 60 m V001Now plot the results of a red band.
# Get red band and plot example
b650 = np.nanargmin(abs(ds['wavelengths'].data-650))
ds.isel(bands=b650).hvplot.image(cmap='viridis', aspect = 'equal', frame_width=500, rasterize=True)
Now open a dataset with ortho=False. The main difference here outside of the data not being orthocorrected is that there are no coordinates associated with this xarray.dataset, all data from the netCDF are included, but are treated as variables.
ds = emit_xarray(fp,ortho=False)
ds
<xarray.Dataset>
Dimensions: (downtrack: 1280, crosstrack: 1242, bands: 285, ortho_y: 2009,
ortho_x: 2353)
Dimensions without coordinates: downtrack, crosstrack, bands, ortho_y, ortho_x
Data variables:
reflectance (downtrack, crosstrack, bands) float32 ...
lat (downtrack, crosstrack) float64 ...
lon (downtrack, crosstrack) float64 ...
elev (downtrack, crosstrack) float64 ...
glt_x (ortho_y, ortho_x) float64 ...
glt_y (ortho_y, ortho_x) float64 ...
wavelengths (bands) float32 ...
fwhm (bands) float32 ...
Attributes: (12/38)
ncei_template_version: NCEI_NetCDF_Swath_Template_v2.0
summary: The Earth Surface Mineral Dust Source ...
keywords: Imaging Spectroscopy, minerals, EMIT, ...
Conventions: CF-1.63
sensor: EMIT (Earth Surface Mineral Dust Sourc...
instrument: EMIT
... ...
southernmost_latitude: -40.39610428069674
spatialResolution: 0.000542232520256367
spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHER...
geotransform: [-6.25120945e+01 5.42232520e-04 -0.00...
day_night_flag: Day
title: EMIT L2A Surface Reflectance 60 m V001We can see the difference in structure and the plotted data if the dataset is orthorectified or not.
# Get red band and plot example
b650 = np.nanargmin(abs(ds['wavelengths'].data-650))
ds.isel(bands=b650).hvplot.image(cmap='viridis', aspect = 'equal', frame_width=500, rasterize=True)
The orthorectified dataset can also be saved as a netCDF4 output that can be reopened using the xarray.open_dataset function if so desired using the cell below.
ds.to_netcdf('../data/example_emit_ortho_out.nc')
# Example for Opening
# ds = xr.open_dataset('../data/example_emit_ortho_out.nc')
As mentioned, the orthorectification process for EMIT data utilizes a premade geometric lookup table (GLT) that is included in the location group of the file. glt_x and glt_y contain the pixel indices from the raw dataset to construct an orthocorrected array. These indices data can be pulled from the raw dataset to populate an orthocorrected array using numpy broadcasting.
To do this, first read in the location group data.
loc = xr.open_dataset(fp, engine = 'h5netcdf', group='location')
Next, build a numpy array using the GLT arrays from the location group. Use the stack function to stack the x and y GLT arrays and use the function nan_to_num to set the NaN values to the GLT_NODATA_VALUE, which is 0 for EMIT datasets. After doing this, you can check the shape of the array to see the expected height and width in pixels of the image after orthorectification.
GLT_NODATA_VALUE=0
glt_array = np.nan_to_num(np.stack([loc['glt_x'].data,loc['glt_y'].data],axis=-1),nan=GLT_NODATA_VALUE).astype(int)
glt_array.shape
(2009, 2353, 2)
After we have a GLT array, we need to open and create a numpy array from the reflectance dataset. Open the dataset using xarray and assign the data to an array. Here you can check the shape of the array to see the dimensions of the uncorrected data.
ds = xr.open_dataset(fp,engine = 'h5netcdf')
ds_array = ds['reflectance'].data
ds_array.shape
(1280, 1242, 285)
Now that we have both arrays, we need to assign a fill value for positions that have no data. We can then construct an empty array with the dimensions we desire (GLT shape) and populate it with values from the dataset. To do this we first use np.zeros to create an array of all zeros then add the fill_value to the array of zeros to change the zeros to the fill_value.
Next we can build an array of valid locations by omitting the portions of the array containing the GLT_NODATA_VALUES.
After that, change the indexing of the GLT array which is one based to zero based by subtracting 1 from them.
Lastly, we can use broadcasting/indexing to pull through the values we want from the dataset into our new output array.
# Build Output Dataset
fill_value = -9999
out_ds = np.zeros((glt_array.shape[0], glt_array.shape[1], ds_array.shape[-1]), dtype=np.float32) + fill_value
valid_glt = np.all(glt_array != GLT_NODATA_VALUE, axis=-1)
# Adjust for One based Index
glt_array[valid_glt] -= 1
# Use indexing/broadcasting to populate array cells with 0 values
out_ds[valid_glt, :] = ds_array[glt_array[valid_glt, 1], glt_array[valid_glt, 0], :]
out_ds.shape
(2009, 2353, 285)
After this step, we have an array of values that is orthorectified, but if we want to have a grid of lat/lon values we still need to calculate them using the geotransform. We can retrieve this information from the dataset attributes (ds.attrs['geotransform']).
GT = ds.geotransform
GT
array([-6.25120945e+01, 5.42232520e-04, -0.00000000e+00, -3.93067591e+01,
-0.00000000e+00, -5.42232520e-04])
The geotransform consists of 2 formulas with 6 coefficients used to calculate the xy-grid to project the dataset.
x_geo = GT[0] + x * GT[1] + y * GT[2]
y_geo = GT[3] + x * GT[4] + y * GT[5]
GT[0] - The x-coordinate of the upper-left corner of the upper-left pixel
GT[1] - W-E pixel width
GT[2] - Row rotation (zero for North up images)
GT[3] - The y-coordinate of the upper-left corner of the upper-left pixel
GT[4] - Column rotation (zero for North up images)
GT[5] - N-S pixel height (negative value for a north-up image)
Create empty arrays for the x and y (lon and lat) based on the dimensions of the dataset.
# Create Array for Lat and Lon and fill
dim_x = loc.glt_x.shape[1]
dim_y = loc.glt_x.shape[0]
lon = np.zeros(dim_x)
lat = np.zeros(dim_y)
After we have an empty array with the correct dimensions, we can populate it with values using the geotransform formula to build a lat/lon grid for plotting the orthorectified data. We can remove the GT[2] and GT[4] terms from the equation since our orthorectified image is North up. Calclutate the latitude and longitude for each row (x_dim) and column (y_dim) of the dataset.
for x in np.arange(dim_x):
x_geo = GT[0] + x * GT[1]
lon[x] = x_geo
for y in np.arange(dim_y):
y_geo = GT[3] + y * GT[5]
lat[y] = y_geo
lat, lon
(array([-39.30675915, -39.30730138, -39.30784361, ..., -40.39447758,
-40.39501982, -40.39556205]),
array([-62.51209453, -62.5115523 , -62.51101007, ..., -61.23784811,
-61.23730588, -61.23676365]))
As a last step we can place this and remaining information from the EMIT dataset, such as attributes and wavelengths into an xarray.Dataset if desired. To do this, build dictionaries for the coordinates and variables components of the dataset. In this example, the variable would be reflectance. First, we'll want to read in the sensor_band_parameters group to get the wavelength and fwhm info, then build the dictionaries.
wvl = xr.open_dataset(fp, engine = 'h5netcdf', group='sensor_band_parameters')
coords = {'lat':(['lat'],lat), 'lon':(['lon'],lon), **wvl.variables} ## ** upacks the existing dictionary from the wvl dataset.
data_vars = {'reflectance':(['lat','lon','bands'], out_ds)}
Now build an xarray.Dataset using the dictionaries, then add attributes to the dataset from the non-orthocorrected data. We also can add the crs using the rio.write_crs function, to allow rasterio and rioxarray to recognize the CRS for future use.
Note: the wavelength and fwhm are included as non-dimensional coordinates because they are independent of Lat/Lon.
out_xr = xr.Dataset(data_vars=data_vars, coords=coords, attrs= ds.attrs)
out_xr['reflectance'].attrs = ds['reflectance'].attrs
out_xr.coords['lat'].attrs = loc['lat'].attrs
out_xr.coords['lon'].attrs = loc['lon'].attrs
out_xr.rio.write_crs(ds.spatial_ref,inplace=True) # Add CRS in easily recognizable format
out_xr
<xarray.Dataset>
Dimensions: (lat: 2009, lon: 2353, bands: 285)
Coordinates:
* lat (lat) float64 -39.31 -39.31 -39.31 ... -40.39 -40.4 -40.4
* lon (lon) float64 -62.51 -62.51 -62.51 ... -61.24 -61.24 -61.24
wavelengths (bands) float32 ...
fwhm (bands) float32 ...
spatial_ref int32 0
Dimensions without coordinates: bands
Data variables:
reflectance (lat, lon, bands) float32 -9.999e+03 -9.999e+03 ... -9.999e+03
Attributes: (12/38)
ncei_template_version: NCEI_NetCDF_Swath_Template_v2.0
summary: The Earth Surface Mineral Dust Source ...
keywords: Imaging Spectroscopy, minerals, EMIT, ...
Conventions: CF-1.63
sensor: EMIT (Earth Surface Mineral Dust Sourc...
instrument: EMIT
... ...
southernmost_latitude: -40.39610428069674
spatialResolution: 0.000542232520256367
spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHER...
geotransform: [-6.25120945e+01 5.42232520e-04 -0.00...
day_night_flag: Day
title: EMIT L2A Surface Reflectance 60 m V001Now we have an orthorectified dataset with coordinates that can be plotted. We can also mask the fill_values of -9999 to make clear figures.
out_xr = out_xr.where(out_xr['reflectance'] != fill_value)
out_xr.isel(bands=b650).hvplot.image(cmap='viridis', aspect = 'equal', frame_width=500, rasterize=True)
Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://lpdaac.usgs.gov/
Date last modified: 01-04-2023
¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.